@emotion circuits

mentions 1 type Person

23:35

2026-06-12

lesswrong.com

large-language-models

When Emotion Descriptors Fail: AI-Native Functions of Emotion Vectors

A new analysis argues that emotion vectors in large language models may serve AI-native functions with no human analog, such as reward hacking, challenging anthropocentric emotion labels and raising alig…

// co-occurs with top 7 entities

Wang et al. (1) · Anthropic (1) · LLM (1) · emotion vectors (1) · reward hacking (1) · mechanistic interpretability (1) · alignment (1)